New scheduler for distribution of groups of related tests #191

Merged
merged 6 commits from the loadsuite branch into pytest-dev:master on Aug 3, 2017

Conversation

@carlos-jenkins (Contributor) commented Jul 20, 2017

Hi, this PR is a possible fix of #18 (and its duplicate #84).

As stated in #18, the current implementation of the LoadScheduler distributes tests without taking their relationship into account. In particular, if the user is using a module-level fixture that performs a large amount of work, distribution in this manner will trigger the fixture on each node, causing a large overhead.

In my case, we use the Topology Modular Framework to build a topology of nodes. When using the topology_docker builder, each module will start a set of Docker nodes, connect them, initialize them, configure them, and then the testing can start. When running on other builders, we have an even larger overhead for building and destroying the topology.
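As an illustration of the kind of module-level fixture involved, a minimal made-up sketch (the fixture name is invented and the sleep only stands in for the expensive topology setup described above):

import time
import pytest

@pytest.fixture(scope='module')
def topology():
    # Stand-in for expensive setup (e.g. building and wiring Docker nodes);
    # with plain load scheduling this runs once per worker that receives
    # tests from this module.
    time.sleep(30)
    topo = {'nodes': ['sw1', 'hs1']}
    yield topo
    # expensive teardown would go here as well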

With this new scheduler, the tests are aggregated by suite (basically anything before the :: in the nodeid of a test). I called these chunks of related tests "work units". The scheduler then distributes complete work units to the workers, thus triggering the module-level fixtures only once per worker.
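Roughly speaking, the work-unit key described above is everything before the first :: of a test's nodeid; a simplified sketch (not the exact code in this PR):

def work_unit_key(nodeid):
    # 'example/loadsuite/test/test_alpha.py::test_alpha0'
    #   -> 'example/loadsuite/test/test_alpha.py'
    return nodeid.split('::', 1)[0]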

We are running a test suite of more than 150 large tests in series that takes 20 hours. We tried running it with the xdist LoadScheduler and it took even longer (30 hours, at which point we stopped it). With this change we are able to scale the solution, and with just 4 workers we were able to reduce it to 5 hours.

I've included an example test suite using the loadsuite scheduler in examples/loadsuite that shows 3 simple suites (alpha, beta, gamma), each taking a total of 50 seconds to complete. The results are as follows:

  • Serial: 150 seconds.
  • 3 workers: 50 seconds (one suite per worker).
  • 2 workers: 100 seconds (two suites in one worker, one suite in the other worker).
  • More than 3 workers: raises an error, not enough work.

This PR still requires a little bit of work, which I'm willing to do with your guidance. In particular:

  • I'm using OrderedDicts to keep track of ordering, but they aren't available on Python 2.6. There is a PyPI package for Python 2.6 and below that provides it, but I was hesitant to add it; maybe it could be added conditionally for Python 2.6 only.
  • Include tests specific to the new scheduler (it currently works fine, but doesn't have regression tests).
  • Not sure how to handle the len(workers) > len(suites) case. Currently we are just raising a RuntimeError.
  • The changelog thingy.
  • Document this new scheduler option.
  • Any other feedback you may want to provide.

Thank you very much for your consideration of this PR.

@RonnyPfannschmidt (Member)

Hi, I did an initial skim and this looks great, it's a nice improvement 👍 and it definitely should go in soon.

We should spend a bit of time to discuss/prepare for different definitions of suite, as for some people it is more important to partition based on a session-scoped parametrized fixture, for example.

However, currently xdist does not pass on parameter values and scopes when informing the master of tests, so this may turn out to be an unnecessary/bad exercise.

@nicoddemus (Member)

Definitely, thanks @carlos-jenkins for the PR.

I will take a closer look ASAP, possibly today.

@flub (Member) left a comment

Seems good at first sight. The main thing I would say is that naming things is hard. pytest never really defined what a suite is, and searching the current pytest.org for "suite" seems to suggest it gets used very differently. Internally pytest calls these collectors, but when using --collect-only you see them as <Module>, <Class>, etc., so I'm not really sure what to suggest.

Also, tests would be nice of course :)

@@ -791,3 +366,6 @@ def pytest_testnodedown(self, node, error):
# def pytest_xdist_rsyncfinish(self, source, gateways):
# targets = ", ".join(["[%s]" % gw.id for gw in gateways])
# self.write_line("rsyncfinish: %s -> %s" %(source, targets))


__all__ = ['DSession']
Member

I'm not too keen on starting to add __all__ variables to xdist modules. These are all internal modules and maintain no outward API so I don't see what benefit this brings.

This comment obviously applies to everywhere __all__ is used.

Member

I agree, I don't think xdist is mature enough to promise any API other than what we promise through hooks.


class LoadSuiteScheduling:
"""
Implement load scheduling, but grouping by suite, across nodes.
Member

Tiny irrelevant nitpicking, but this probably wants to be on the line above with the opening quotes as is done in the other modules. Other docstrings below as well.

@carlos-jenkins (Contributor Author) Jul 26, 2017

I understand the need for consistency in the codebase and PEP 257 compliance. Will change it.

if log is None:
self.log = Producer('loadsuitesched')
else:
self.log = log.loadsuitesched
Member

I'm not entirely sure what object is supposed to be passed, but this looks a little suspicious to me. If you're confident it's right then never mind.

Contributor Author

I'm not confident about it either. I just followed the structure of the other schedulers to be sure:

self.log = log.eachsched

self.log = log.loadsched

Documentation says:

 :log: A py.log.Producer instance.

Which is this one (and marked Deprecated):

https://github.com/pytest-dev/py/blob/44cf778ebe8c64178ba9c94e673ed474a26ab7bb/py/_log/log.py#L37

It seems that the . (__getattr__) triggers the creation of a new logger using the attribute name as argument to the constructor.

I'd prefer to leave it as-is, for consistency with the other schedulers.

@flub (Member) commented Jul 21, 2017

Also, as the others said: great work and thanks for the PR!

@flub (Member) commented Jul 21, 2017

As a follow-up on the whole suite thing and Ronny's comment on it: just to get some random (maybe useless) ideas started, how about considering a command-line option which takes a list of the collection IDs which should be run on one node? That is, if you have a structure like pkg0.mod0.TestCls.test_meth etc., specifying something like --load_collections=pkg0::mod0,pkg0::mod1,pkg1 would define 3 collections which will not be split up between different nodes. If there are any tests collected which do not have any of the given ID-prefixes then they'll end up anywhere we want (I'd use the lovely standardisation phrase "unspecified" while still guaranteeing they'll be run - but I'm sure people will start relying on whatever implementation we end up choosing first).
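Purely as a sketch of what such an option could do (nothing like this exists in xdist; the function and its behaviour are hypothetical), each collected nodeid could be assigned to the longest matching prefix, or left ungrouped:

def group_for(nodeid, prefixes):
    # prefixes as they would come from --load_collections,
    # e.g. ['pkg0::mod0', 'pkg0::mod1', 'pkg1']
    matches = [p for p in prefixes if nodeid.startswith(p)]
    return max(matches, key=len) if matches else None  # None -> "unspecified"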

@RonnyPfannschmidt (Member)

We should mark this as experimental and break it if the need arises (reminds me that we should settle experimentation properly).

@nicoddemus (Member) left a comment

All in all excellent work! 👍

I left some comments for discussion.

@@ -791,3 +366,6 @@ def pytest_testnodedown(self, node, error):
# def pytest_xdist_rsyncfinish(self, source, gateways):
# targets = ", ".join(["[%s]" % gw.id for gw in gateways])
# self.write_line("rsyncfinish: %s -> %s" %(source, targets))


__all__ = ['DSession']
Member

I agree, I don't think xdist is mature enough to promise any API other than what we promise through hooks.

'load': LoadScheduling,
'loadsuite': LoadSuiteScheduling
}
return schedulers[dist](config, log)
Member

I wonder if we should make this experiment as a separate plugin instead? pytest-xdist-loadsuite for example. The addition of pytest_xdist_make_scheduler was done exactly to allow this kind of experimentation. 😁

Contributor Author

Up to you, no problem on my part either. I really think this would add value to the xdist core, but as you prefer :)

Member

Actually it seems this is something everybody wants, so let's continue the discussion. Did you read the comments about using a marker to customize this behavior? Do you have an opinion on that?

self.numnodes = len(parse_spec_config(config))
self.collection = None

self.workqueue = OrderedDict()
Member

About the question regarding using OrderedDict vs Python 2.6: IMO it would be acceptable for py26 users to get an error message if they try to use this new scheduler, "loadsuite scheduler is not supported in Python 2.6" or something like this. I think this is acceptable because py26 users will still have access to this plugin and the other scheduling modes, plus we will probably drop support for py26 somewhat soonish.

Member

Also another and simpler alternative is to depend on ordereddict from PyPI for py26 only as we did recently in pytest-dev/pytest#2617.

Contributor Author

Done! Modified the setup.py to add the dependency conditionally.
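For reference, a conditional dependency of that kind can be expressed with a setuptools environment marker; a minimal sketch (illustrative only, not the actual pytest-xdist setup.py):

from setuptools import setup

setup(
    name='example-plugin',   # illustrative name
    version='0.1',
    py_modules=['example_plugin'],
    extras_require={
        # environment marker: the backport is pulled in only on Python 2.6
        ':python_version == "2.6"': ['ordereddict'],
    },
)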

(...)
}

:assigned_work: Ordered dictionary that maps worker nodes with their
Member

Great docs!

self.registered_collections[node] = list(collection)
return

# A new node has been added later, perhaps an original one died.
Member

When this happens, shouldn't we register the new node and its collection, like it is done in line 225 above?

Contributor Author

You're right, I mistranslated from the load scheduler. Will fix it.

self.assigned_work[node][suite][nodeid] = True
self._reschedule(node)

def _asign_work_unit(self, node):
Member

typo: assign

Contributor Author

Thanks!

assigned_to_node[suite] = work_unit

# Ask the node to execute the workload
worker_collection = self.registered_collections[node]
Member

Sorry if I'm missing something, but wouldn't it be simpler to keep a single dict mapping nodeid to its completed status, instead of each node having its own separate data structure? We already guarantee that all workers have collected the exact same items, so we can use that assumption to drop registered_collections. The fewer data structures to keep in sync, the better, IMHO.

I think you can change self.collection into a dict of suite -> list of item ids.

Contributor Author

Following the life cycle of these data structures is a little complex. In particular, registered_collections is filled during initialization, while the nodes are starting and the structure is being built. On the other hand, collection acts as a flag indicating both that the collection of tests in all nodes has completed and that the initial distribution of work has been done. You cannot overload the semantics of both into one.

In the particular line of code that you commented on we could replace worker_collection = self.registered_collections[node] with self.collection without problem, as both are equivalent, but I preferred to grab the collection from the node itself for correctness and to make it clearer that this needs to happen after the node was registered (regardless of whether it was registered at initialization or later).

I think you can change self.collection into a dict of suite -> list of item ids.

We could, but because of what I wrote in the first paragraph you would still need a structure to build while initializing, plus a second one / a boolean to indicate that initialization is done, and if you look there are other utilitarian uses of self.collection that justify its existence.
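For readers following along, the rough shape of the structures being discussed (the names come from the PR; the literal values here are just illustrative):

from collections import OrderedDict

# suite -> OrderedDict(nodeid -> completed?) of work still to be distributed
workqueue = OrderedDict([
    ('example/loadsuite/test/test_alpha.py', OrderedDict([
        ('example/loadsuite/test/test_alpha.py::test_alpha0', False),
        ('example/loadsuite/test/test_alpha.py::test_alpha1', False),
    ])),
])

assigned_work = {}           # node -> OrderedDict(suite -> OrderedDict(nodeid -> completed?))
registered_collections = {}  # node -> list of collected nodeids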

self._reschedule(node)
return

# XXX allow nodes to have different collections
Member

This comment can probably be removed.

Contributor Author

Done.


# Avoid having more workers than work
if len(self.workqueue) < len(self.nodes):
# FIXME: Can we handle this better?
Member

How about we shut down the excess nodes instead of raising an error?

Contributor Author

How can I do that?

node.shutdown() ?

Contributor Author

I tested calling shutdown on the extra nodes and it seems to work correctly:

72ac6a3#diff-5dfc52f66c10897e228ed2af4fc310ceR351

The nodes don't die until the suite has finished, but they seem to die OK.
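For context, the idea in the linked commit boils down to something like this (a hypothetical free-function sketch; the actual change lives inside the scheduler and uses its own self.nodes and self.workqueue):

def shutdown_excess_nodes(nodes, workqueue):
    # If there are more workers than work units, ask the extra workers to
    # shut down; xdist worker nodes expose a shutdown() method.
    extra = len(nodes) - len(workqueue)
    if extra > 0:
        for node in nodes[-extra:]:
            node.shutdown()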

@nicoddemus (Member)

That is if you have a structure like pkg0.mod0.TestCls.test_meth etc specifying something like --load_collections=pkg0::mod0,pkg0::mod1,pkg1 would define 3 collections which will not be split up between different nodes. If there are any tests collected which do not have any of the given ID-prefixes then they'll end up anywhere we want

I'm not sure; this kind of information seems to me like it should be defined in the tests themselves, not controlled by the command line.

How about we use a mark to group tests, for example @pytest.mark.xdist_group('foobar'), where 'foobar' is any arbitrary string which xdist will use to send tests marked with the same group to a single node? This would allow for arbitrary grouping by any criteria.

Thinking a little further, having the xdist_group mark working would allow one to implement pytest_collection_modifyitems to add the xdist_group mark with the "suite" of each item, resulting in the same distribution this PR proposes but more general. This would of course need some more changes to the core, at the very least to transfer the xdist_group mark (if any) of each collected item back to the master node so it can properly schedule things.
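For concreteness, usage of the proposed mark could look like this (purely hypothetical at this point; xdist_group does not exist in xdist yet):

import pytest

@pytest.mark.xdist_group('database')
def test_insert():
    pass

@pytest.mark.xdist_group('database')
def test_delete():
    pass

Tests carrying the same group name would then be scheduled on the same worker.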

@flub (Member) commented Jul 26, 2017

You're entirely right, markers are a much better way to go about this.

@carlos-jenkins carlos-jenkins force-pushed the loadsuite branch 7 times, most recently from 337eadc to a7e68f9 Compare July 26, 2017 23:07
@carlos-jenkins (Contributor Author)

@flub

Seems good at first sight. The main thing I would say is that naming things is hard. pytest never really defined what a suite is, and searching the current pytest.org for "suite" seems to suggest it gets used very differently.

Agree. "Suite" has different meanings depending on the context and the person you ask. I'll change the name of everything to loadgroup (?).

@nicoddemus

Thinking a little further, having the xdist_group mark working would allow one to implement pytest_collect_modifyitems which adds the xdist_group mark with the "suit" of each item, resulting in the same distribution this PR proposes but more general. This would of course need some more changes to the core, at the very least to transfer the xdist_group mark (if any) of each collected item back to the master node so it can properly schedule things.

Implementing grouping using either a command line option or markers puts a lot of burden on the user.

Consider the options:

  1. Command line option:

This implies modifying the tox file, Jenkins file, or whatever triggers the execution of the tests, every time a test is added. In our case, we have many colleagues working and adding tests every day. Not only will the command line option become very, very long, it will become unmaintainable.

  2. Markers:

This also implies that all users must add @pytest.mark.xdist_group('name_of_module') to each test, which is not only unnecessary but can trigger maintainability problems (if for some reason somebody copy-pasted it, forgot to change it, forgot to add it, etc.).

In addition, we currently do not pass more than the list of collected tests between nodes; adding introspection to the tests to check if they are marked would imply a large change to the architecture of how xdist currently works, and doesn't add that much value.

So, in order to make it "generic" we are putting a lot of burden on the user.


Let's return to what we are trying to solve. #18 lists 3 use cases:

  • Use case: "I want all the tests in a module to go to the same worker."
  • Use case: "I want all the tests in a class to go the same worker."

I'll discuss the third later. But the first two are basically about reusing fixtures. Currently pytest has 4 scopes for fixtures:

https://docs.pytest.org/en/latest/fixture.html#sharing-a-fixture-across-tests-in-a-module-or-class-session

https://github.com/pytest-dev/pytest/blob/c92760dca8637251eb9e7b9ea4819b32bc0b8042/_pytest/fixtures.py#L848

scope="function"
scope="class"
scope="module"
scope="session"

Let's analyze them one by one:

  • session: Grouping by a session fixture makes no sense, as it will be the same as serial.
  • function: Grouping by a function fixture is the same as using the load scheduler. So we can mark it as supported.
  • module: This scheduler supports it.
  • class: This scheduler supports it (read below).

In consequence, grouping by fixture scope is completely supported with this scheduler.

So, what about the third use case?

  • Use case: "I want all the tests in X to go to the same worker."

Which is arbitrary grouping, and thus implies markers, which is what you propose. My first impression is: do we really need that? Is that a real use case? For what reasons do we need it, other than hypothetical ones? Plus, #18 suggests a solution:

Solution: You know the drill. If tests belong on the same worker, we are betting that there is an implied, shared dependency. Express that shared dependency as a fixture and pytest will take care of the rest for you.

Which means that you must use a fixture to group them: a scoped fixture, either class or module scoped, thus basically implying that we need grouping with those scopes. Which is what this PR implements.

I added the test_delta.py module with two old xunit-style classes to the example suite, and the epsilon package with a single doctest. The collected nodeids look like this:

example/loadsuite/test/test_alpha.py::test_alpha0
[...]
example/loadsuite/test/test_beta.py::test_beta0
[...]
example/loadsuite/test/test_delta.py::Delta1::test_delta0
[...]
example/loadsuite/test/test_delta.py::Delta2::test_delta0
[...]
example/loadsuite/test/test_gamma.py::test_gamma0
[...]
example/loadsuite/epsilon/__init__.py::epsilon.epsilon

Because this algorithm groups by the first ::, Delta1 and Delta2 will be assigned to the same worker. This is a good enough solution, as the class fixtures will be reused. We could also change the algorithm to split not on the first :: but on the last, making classes complete work units.

My 2 cents.

Regards and thanks for comments.

@carlos-jenkins carlos-jenkins force-pushed the loadsuite branch 5 times, most recently from 8364dcf to da48e21 Compare July 27, 2017 01:13
@carlos-jenkins (Contributor Author)

Hi, I just pushed a version with corrections for all the feedback provided, even changing the name to loadscope. In particular I added a function that explains how the nodeids are grouped:

94c1425#diff-13d1cb017e0edf623ce2c3a8a49f9569R278

I changed the split to rsplit, and as a consequence classes are now scheduled in their own work unit.
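In other words, the grouping key is now everything before the last :: of a nodeid; a simplified sketch of the behaviour described above (not the exact code in the linked commit):

def work_unit_key(nodeid):
    # 'example/loadsuite/test/test_delta.py::Delta1::test_delta0'
    #   -> 'example/loadsuite/test/test_delta.py::Delta1'
    # 'example/loadsuite/test/test_alpha.py::test_alpha0'
    #   -> 'example/loadsuite/test/test_alpha.py'
    return nodeid.rsplit('::', 1)[0]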

With this, I think only two things are missing:

  • Tests.
  • Changelog and documentation.

Please let me know if you want to move this implementation forward.

Thanks.

@nicoddemus (Member) commented Jul 27, 2017

Hi @carlos-jenkins,

First of all thanks for the detailed response. We appreciate both your efforts on this PR as well as your patience on following up in the discussion! 👍

This will also imply that all users must add the @pytest.mark.xdist_group('name_of_module') for each test. Which is not only unnecessary, but can trigger maintainability problems (if for some reason somebody copy pasted, forgot to change, forgot to add it, etc).

You are absolutely right that forcing users to manually keep track of the marking is a huge maintainability problem. I should have mentioned that this marking can be done automatically by a simple hook:

import pytest

def pytest_collection_modifyitems(items):
    for item in items:
        suite_name = str(item.fspath)  # the file the test was collected from
        item.add_marker(pytest.mark.xdist_group(suite_name))

The same can easily be extended to group by class as well, or any other way a user needs.

Also, I'm not suggesting that users should implement this hook themselves; rather, by implementing the distribution scheduling in a more general way we can provide built-in options in xdist which use the idea of the hook above to set markers on items automatically, while still allowing users to implement more complex scheduling themselves if they want to.

For example, we could create a command-line option which controls how it should group tests by default:

  • --group-by=no: no automatic grouping (default);
  • --group-by=module: automatically sets the xdist_group mark to the module name of the test item;
  • --group-by=class: automatically sets the xdist_group mark to the class name of the test item;

and so on.

In addition, we currently do not pass more than the list of collected tests between nodes, adding introspection to the tests to check if they are marked will imply a large change on the architecture of how xdist currently work

Indeed we would have to change some things on the architecture, but unless I'm mistaken they would be minimal changes: currently the workers send just a list of item ids to the master; to be able to handle the marker idea, we would have to send in addition to that a dict of "test id -> marker group name". The workers can easily obtain the value of the xdist_group marker after collection and before sending to master.

Now that I think more about it, we don't even need a separate scheduler, we might be able to change the default LoadScheduler to handle the markers idea. This would avoid having to duplicate some code.


session: Grouping by a session fixture makes no sense, as it will be the same as serial.

This is only true if the session fixture is used by all the tests, like an autouse fixture.

Here's a use case based on a session-scoped fixture:

Suppose you have a session fixture which is very costly to create (it starts up a server inside a Docker container) but this session fixture is only used by a small portion of your tests (say 10%). In the current state of things, if you spawn 8 workers, chances are that you will end up spinning up 8 of those fixtures, because the tests which use this fixture will most likely go to separate workers.

With the xdist_group mark idea, the user could implement a hook like this:

import pytest

def pytest_collection_modifyitems(items):
    for item in items:
        if 'docker_server' in item.fixturenames:
            item.add_marker(pytest.mark.xdist_group('docker-server'))

So all tests which use this fixture (wherever they are defined) will go to the same worker, spinning up a single server.

Another example:

In a test suite a few tests (8 out of 400) take a very long time (8+ minutes). Those slow tests don't have much in common; they are slow because they run numerical simulations which, due to their parameters, take a lot of time. When we initially put them in xdist, unfortunately a lot of those tests would end up in the same worker. For example, 6 of those tests might land on the same worker because xdist currently is not deterministic when deciding which test goes where, so a single worker might end up taking 48 minutes (8x6) to finish, while the other workers will have finished their tests long before that.

By marking them with a custom mark (say @pytest.mark.very_slow), we can assign groups to those tests in a round robin fashion.

import pytest

def pytest_collection_modifyitems(items):
    numprocesses = items[0].config.getoption('numprocesses')
    current_worker = 0
    for item in items:
        if item.get_marker('very_slow'):
            item.add_marker(pytest.mark.xdist_group(f'worker-{current_worker}'))
            current_worker += 1
            if current_worker >= numprocesses:
                current_worker = 0

This will distribute the slow tests evenly across the available workers.

The examples above all came from work, and we had to implement ugly hacks to get around them.


@carlos-jenkins I hope I'm not coming across as trying to shoot down your efforts, I'm just trying to brainstorm with everyone involved here to see if perhaps we can get to a better/more general implementation.

Guys,

If we can't reach a consensus now (I mean everyone involved in the discussion), I would rather we work towards merging this PR as it stands, because it improves how things work currently. Also, I think it would be possible to implement this idea of using markers afterwards with some internal refactoring, and without breaking the behavior being introduced here.

Again @carlos-jenkins thanks for your patience!

@RonnyPfannschmidt (Member)

@carlos-jenkins thanks for your updates. In the near future I'd like to get interested people together for changing the shape.

Currently I'm already preparing to shed boxed/looponfail from xdist and to set up more comprehensible state machines for xdist; afterwards I'd like to extend scheduling for per-file, per-fixture and similar details.

@flub (Member) commented Jul 30, 2017

Hello, somehow I don't find this outcome very satisfying. Here we had code that works right now and solves a subset of the problem, so can't we find a way to use it? We all have grand plans for many things, but realistically we also know they only rarely get realised.

If we were to add this code as-is to the core now, we'd have a new scheduler type. We'd just need to be a little careful about naming it, but it would work just fine. It is my understanding that if we later add a mark-scheduler we do not have to do anything else and there would be no conflict. So I'm not actually sure there's any practical downside to adding this now at all. Did I overlook something?

An alternative approach which was mentioned, I think, and which may not be too much extra work, is to turn this scheduler into a plugin. I'd then update the standard documentation to point to this plugin, and it would be a great example for other people who want to customise their own scheduling. Again I've not looked into the practicalities of this, but what are the changes needed to turn this into a plugin?

Cheers

@flub flub reopened this Jul 30, 2017
@nicoddemus (Member)

Hello, somehow I don't find this outcome very satisfying.

I agree, it was not my intention to just shut down the PR at all.

I believe we can add this to the core and refactor things later to support the mark-based distribution idea.

Creating a separate plugin with this functionality is also possible I think.

@nicoddemus nicoddemus closed this Jul 30, 2017
@nicoddemus nicoddemus reopened this Jul 30, 2017
@nicoddemus (Member)

(Oops clicked on the wrong button)

@RonnyPfannschmidt (Member)

Then let's flag it as experimental and merge? @carlos-jenkins would you be OK with such an integration?

@carlos-jenkins (Contributor Author)

@RonnyPfannschmidt sure, for me this is fine. We can pin the hash of the merge in our requirements.txt, so feel free to change anything without breaking us.

@nicoddemus (Member)

Thanks @carlos-jenkins

Before release, how about we create a separate option --group-by, which is used when --dist is load? By default, group-by would be none which sends tests individually to workers, and group-by=module provides the functionality of this PR. Having a separate option would allow us to extend the option to support grouping by classes in the future.

So this:

pytest --dist=loadsuite ...

Becomes:

pytest --dist=load --group-by=module ...

What do you guys think?

@carlos-jenkins (Contributor Author)

I think that is perfect, @nicoddemus. Just one detail: the current implementation groups by the last :: in the test ids, so it groups by module for plain functions, or by class for tests defined in classes:

b0916f0#diff-13d1cb017e0edf623ce2c3a8a49f9569R278

@RonnyPfannschmidt (Member)

Since this works right now, I propose merging first as an experimental feature, then pushing changes on top.

@nicoddemus (Member)

I think we should at least add some functional tests and a changelog entry presenting the feature before merging.

@nicoddemus (Member)

Just one detail: the current implementation groups by the last :: in the test ids, so it groups by module for plain functions, or by class for tests defined in classes:

Oh right. Hmm not sure how to name it then, --group-by=modules-and-classes? 😁

Or we can leave it as --group-by=suite if people think that's better.

But as @RonnyPfannschmidt mentioned, we can leave the bikeshedding of the option name to later.

@flub (Member) commented Jul 31, 2017 via email

@nicoddemus (Member)

I'm not very keen on this. It assumes that the next scheduler feature will want a different kind of grouping. If we do this as just a new scheduler name, e.g. --dist=loadbunch or so, then we don't introduce anything new to any other part of xdist. If we later add another grouping and it still makes sense to add --group-by, then we can add it at that point.

No problem in adding this later; I guess it is acceptable to break the UI later for an experimental feature. 👍

@flub (Member) commented Jul 31, 2017 via email

@nicoddemus (Member)

@flub you are right, a separate --dist option doesn't necessarily map to a separate Scheduler implementation; the same implementation can handle multiple --dist=loadxxx options. 👍

@nicoddemus (Member) commented Aug 2, 2017

Hi guys,

I'm writing an acceptance test for this feature so we can get it in, but I'm not getting the results I was expecting.

Consider 3 test files, test_foo.py, test_foobar.py and test_bar.py, with the same contents:

import pytest
@pytest.mark.parametrize('i', range(10))
def test(i):
    pass

Running:

pytest --dist=loadscope -n2 -v

I get this output:

============================= test session starts =============================
platform win32 -- Python 3.6.0, pytest-3.1.2, py-1.4.34, pluggy-0.4.0 -- x:\pytest-xdist\.env36\scripts\python.exe
cachedir: ..\.cache
rootdir: X:\pytest-xdist, inifile: tox.ini
plugins: xdist-1.18.2.dev7+g144b37a.d20170727
[gw0] win32 Python 3.6.0 cwd: X:\pytest-xdist\.tmp
[gw1] win32 Python 3.6.0 cwd: X:\pytest-xdist\.tmp
[gw0] Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]
[gw1] Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]
gw0 [30] / gw1 [30]
scheduling tests via LoadScheduling

test_bar.py::test[1]
test_bar.py::test[0]
[gw1] PASSED test_bar.py::test[1]
[gw0] PASSED test_bar.py::test[0]
test_bar.py::test[3]
test_bar.py::test[2]
[gw1] PASSED test_bar.py::test[3]
[gw0] PASSED test_bar.py::test[2]
test_bar.py::test[4]
[gw0] PASSED test_bar.py::test[4]
test_bar.py::test[5]
[gw1] PASSED test_bar.py::test[5]
...
(snip the rest)

As can be seen, tests from test_bar.py are running on both gw0 and gw1. I was expecting all tests from each module to always run on the same worker. Am I missing something? I will investigate this a little more when I have some time.

@carlos-jenkins (Contributor Author)

Hi @nicoddemus

Thanks for the effort on testing this.

I've noticed that you're using -n2, which automatically implies using LoadScheduling. You can actually read it in the output:

scheduling tests via LoadScheduling

The correct way to call it is with:

pytest --dist=loadscope --tx=2*popen -v

You should be able to read:

scheduling tests via LoadScopeScheduling

@nicoddemus (Member)

Rá, indeed! Thanks!

I will push my tests to your branch soon. 😁

@nicoddemus (Member)

Pushed two very basic functional tests for --dist=loadscope, and also made -nX and --dist=loadscope work properly together.

I didn't add any docs yet.

@nicoddemus (Member)

If you guys are OK with the implementation so far I guess we can merge it now that we have some regression tests in place.

@RonnyPfannschmidt RonnyPfannschmidt merged commit 2c66187 into pytest-dev:master Aug 3, 2017
@RonnyPfannschmidt (Member) commented Aug 3, 2017

@nicoddemus thanks for tying up @carlos-jenkins' fabulous work.

We now need to decide how to document this as experimental and how to communicate/implement eventual changes to it.
